Fig. 1. Computer-generated holography (CGH) results captured with a display prototype that uses a fast, low-precision (i.e., 4 bit) phase spatial light modulator
(SLM). When supervised with 2.5D RGBD images, our approach (2nd column) provides a better image quality than the state-of-the-art neural 3D holography
algorithm [Choi et al. 2021a] (1st column) using this low-precision SLM. Our CGH framework is flexible in not only enabling 2.5D but also 3D focal stack and
4D light field supervision. The former approach (3rd column) results in the best in-focus (red boxes) and out-of-focus (white boxes) image quality among 2.5D
and 3D CGH algorithms. Our 4D light field-supervised approach (5th column) outperforms the recently proposed OLAS method [Padmanaban et al. 2019]
(4th column) by a large margin and utilizes the space-bandwidth product more effectively, as shown by the simulated light fields in the lower right images.


Holographic near-eye displays offer unprecedented capabilities for virtual
and augmented reality systems, including perceptually important focus cues.
Although artificial intelligence-driven algorithms for computer-generated
holography (CGH) have recently made much progress in improving the
image quality and synthesis efficiency of holograms, these algorithms are
not directly applicable to emerging phase-only spatial light modulators


*denotes equal contribution.


Authors’ addresses: Suyeon Choi, suyeon@stanford.edu, Stanford University, USA;
Manu Gopakumar, manugopa@stanford.edu, Stanford University, USA; Yifan Peng,
evanpeng@stanford.edu, Stanford University, USA; Jonghyun Kim, jonghyunk@nvidia.com,
NVIDIA and Stanford University, USA; Matthew O’Toole, mpotoole@cmu.edu,
Carnegie Mellon University, USA; Gordon Wetzstein, gordon.wetzstein@stanford.edu,
Stanford University, USA.


Permission to make digital or hard copies of all or part of this work for personal or 
classroom use is granted without fee provided that copies are not made or distributed 
for profit or commercial advantage and that copies bear this notice and the full citation 
on the first page. Copyrights for components of this work owned by others than the 
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or 
republish, to post on servers or to redistribute to lists, requires prior specific permission 
and/or a fee. Request permissions from permissions@acm.org. 

© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. 
0730-0301/2022/00-ART00 $15.00 

https://doi.org/0000.0000 


(SLM) that are extremely fast but offer phase control with very limited 
precision. The speed of these SLMs offers time multiplexing capabilities, 
essentially enabling partially-coherent holographic display modes. Here we 
report advances in camera-calibrated wave propagation models for these 
types of holographic near-eye displays and we develop a CGH framework 
that robustly optimizes the heavily quantized phase patterns of fast SLMs. 
Our framework is flexible in supporting runtime supervision with different 
types of content, including 2D and 2.5D RGBD images, 3D focal stacks, and 
4D light fields. Using our framework, we demonstrate state-of-the-art results 
for all of these scenarios in simulation and experiment. 
methodologies → Computer graphics.

1 INTRODUCTION


Holographic near-eye displays for virtual and augmented reality 
(VR/AR) applications offer many benefits to wearable computing 
systems over conventional microdisplays. These include high peak 
brightness, power efficiency, support of perceptually important fo- 
cus cues and vision-correcting capabilities [Kim et al. 2021], as well 
as thin device form factors [Kim et al. 2022; Maimone and Wang 
2020]. Yet, the image quality achieved by computer-generated holog- 
raphy (CGH) lags far behind that of conventional displays, requiring 
further advancements in the algorithms driving holographic dis- 
plays. 

Recently, artificial intelligence (AI) methods have enabled significant
improvements in image quality [Chakravarthula et al. 2020;
Choi et al. 2021a; Peng et al. 2020] and speed [Horisaki et al. 2018;
Peng et al. 2020; Shi et al. 2021] of holographic displays. These
algorithms, however, are primarily applicable to slow liquid crystal-based
(LC) spatial light modulators (SLMs) that offer control of
the phase of a coherent light source at high precision. Emerging
micro-electromechanical (MEMS) phase SLMs [Bartlett et al. 2019]
offer potential benefits over LC-based systems in being more light
efficient, significantly faster, better suited to operate across a wide
range of wavelengths, and more stable for varying temperatures.
Indeed, MEMS-based amplitude SLMs are one of the most popular
technology choices for many display applications, including projectors,
so MEMS-based phase SLMs may also become increasingly
important for holography applications. Unfortunately, the algorithms
developed for high-precision LC-based phase SLMs suffer
from a degradation in image quality and fail to fully utilize time
multiplexing when used with the high-framerate, heavily quantized
phase control that MEMS-based SLMs offer. For example, DLP’s
phase SLM by Texas Instruments only offers up to 4 bits of precision,
that is, 16 unevenly distributed discrete levels of phase control,
at frame rates of 1440 Hz [Bartlett et al. 2019; Ketchum and Blanche
2021].

The focus of our work is to extend AI-driven CGH algorithms
to operate with emerging fast but heavily quantized phase SLMs.
This is a non-trivial task, because quantization is non-differentiable,
so the standard machine learning toolset does not directly apply
in these settings. Moreover, most of the degrees of freedom of a
holographic display stem from its ability to create constructive
and destructive interference, which can only be achieved instantaneously
in time but not between time-multiplexed frames. It is thus
not clear whether the partially-coherent holographic display mode
enabled by the fast SLM speed is actually beneficial when combined
with a limited precision of phase control or how it affects image
quality. We propose an algorithmic CGH framework that robustly
optimizes holograms in these mathematically challenging scenarios
and explore the aforementioned tradeoff, demonstrating significant
benefits in image quality and space-bandwidth utilization [Yoo et al.
2021] of higher-speed phase SLMs. Moreover, we develop a learned
propagation model that is more flexible than previously proposed
alternatives: it is calibrated using 3D multiplane supervision
but can leverage a variety of target content, including 2D
images, 2.5D RGBD images, 3D focal stacks, and 4D light fields, for
supervision during runtime.
Specifically, our contributions include the following:


• a new variant of a camera-calibrated wave propagation model
for holographic displays, which is flexible in enabling runtime
supervision by 2D, 2.5D, 3D, or 4D content;

• a framework for robust CGH optimization with fast but heavily
quantized phase-only SLMs;

• experimental demonstration of improved image quality and
better utilization of the SLM’s space-bandwidth product enabled
by our framework.


Source code for this paper is available at computationalimaging.org. 


2 RELATED WORK 


Many aspects of holographic displays, including optics, SLMs, and 
algorithms, have advanced considerably over the last few years. 
Detailed discussions of many of these advancements can be found 
in the survey papers by Yaras [2010], Park [2017], and Chang et 
al. [2020]. A recent roadmap article by Javidi et al. [2021] also out- 
lines current and future research efforts of digital holography in 
non-display areas, including 3D imaging and microscopy. 

Our work primarily focuses on advancing the algorithms driv- 
ing holographic near-eye displays. In a nutshell, the CGH problem 
comprises several parts. First, the target content is specified in some 
format that needs to be converted to a complex-valued wavefield, 
such as point clouds [Fienup 1982; Gerchberg 1972; Maimone et al. 
2017; Shi et al. 2017, 2021], polygons [Chen and Wilkinson 2009; 
Matsushima and Nakahara 2009], light rays [Wakunami et al. 2013; 
Zhang et al. 2011], image layers [Chen et al. 2021; Chen and Chu 
2015; Zhang et al. 2017], or light fields [Benton 1983; Kang et al. 
2008; Lucente and Galyean 1995; Padmanaban et al. 2019; Ziegler 
et al. 2007]. Second, this wavefield needs to be encoded by a phase-only
SLM, which can be achieved by fast, direct phase coding approaches
[Hsueh and Sawchuk 1978; Lee 1970; Maimone et al. 2017]
or slow, iterative solvers, such as classic Gerchberg-Saxton-type
algorithms [Fienup 1982; Gerchberg 1972] or variants of stochastic
gradient descent [Chakravarthula et al. 2019; Peng et al. 2020].

Yet, the simulated wave propagation models used by most of these
CGH algorithms do not always model the physical optics faithfully,
thereby degrading image quality. Moreover, the computational complexity
of these algorithms often prevents them from being practical
in the power-constrained settings of a wearable computing system.
Emerging artificial intelligence-driven CGH approaches have
focused on addressing these limitations. For example, surrogate
gradient methods that use a camera in the loop (CITL) for hologram
optimization can significantly improve image quality [Choi
et al. 2021b; Peng et al. 2021, 2020]. Alternatively, differentiable
wave propagation models can be learned to calibrate for the gap
between simulated models and physical optics [Chakravarthula
et al. 2020; Choi et al. 2021a; Kavakli et al. 2022; Peng et al. 2020].
Moreover, neural networks can be trained to enable real-time CGH
algorithms [Horisaki et al. 2021, 2018; Peng et al. 2020; Shi et al.
2021].

Note that our work was developed concurrently with and independently
of the very recent work by Lee et al. [2022]. Although both works
share some similarity in applying constrained gradient descent methods
to optimize binary or heavily quantized phase holograms, our
framework improves on the counterpart through the use of a learned
propagation model for better image quality, the ability to effectively
handle SLMs with varied bit depths and non-linear quantizations,
and compatibility with a wide range of supervision sources.

Fig. 2. Illustration of our calibrated wave propagation model and 2D/3D/4D supervision strategy for the multiplexed, quantized hologram generation. The
complex-valued field at the SLM is adjusted by several learnable terms (amplitude and phase at the SLM plane as well as look-up table for phase mapping)
and then processed by a CNN. The resulting complex-valued wave field is propagated to all target planes using the ASM wave propagation operator with two
extra learnable terms (amplitude and phase at the Fourier domain). The wave fields at each target plane are processed again by smaller CNNs. The proposed
framework applies to multiple input forms, including 2D, 2.5D, 3D, and 4D.


3 A FLEXIBLE FRAMEWORK FOR CGH


In Fresnel holography, a collimated coherent light beam illuminates
an SLM with a source field $u_{\text{src}}$, and the light reflected in response
reproduces a target intensity distribution. To generate this hologram,
a phase-only SLM imparts a spatially-varying delay $\phi$ on the phase of
the field. After propagating a distance $z$ from the SLM, the resulting
complex-valued field $u_z$ is given by the following image formation
model:

\[
\begin{aligned}
u_z(x, y, \lambda) &= f\left(u_{\text{SLM}}(x, y, \lambda), z\right), \\
u_{\text{SLM}}(x, y, \lambda) &= e^{i\phi(x, y, \lambda)}\, u_{\text{src}}(x, y, \lambda),
\end{aligned}
\tag{1}
\]

where $\lambda$ is the wavelength of light, $x, y$ are the transverse coordinates,
and $u_{\text{SLM}}$ is the modulated field at the SLM. The wave propagation
operator $f$ models free-space propagation between two parallel
planes separated by a distance $z$. For notational convenience, we
will omit the dependence on $x, y, \lambda$ and the source field $u_{\text{src}}$. The
intensity pattern generated by this display at distance $z$ in front of
the SLM when showing phase $\phi$ is therefore $\left| f\left(e^{i\phi}, z\right) \right|^2$.


When using low-bit SLMs for time-multiplexed holography, the
effect of quantization is not negligible. To model a quantized phase-only
SLM with $M \times N$ pixels, where every pixel offers phase control
with limited precision, we define a quantization operator $q$:

\[
q: \mathbb{R}^{M \times N} \rightarrow \mathcal{Q}^{M \times N}, \quad \phi \mapsto q(\phi) = \Pi_{\mathcal{Q}}(\phi),
\tag{2}
\]

where $\Pi_{\mathcal{Q}}$ is the projection operator that maps the continuous phase
value to the closest discrete phase in the feasible set $\mathcal{Q}$ supported
by the SLM.
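
As an illustration of Eq. 2, the following minimal NumPy sketch projects every pixel onto the closest level of a feasible set, measuring closeness on the phase circle. The function name `project_to_levels` and the evenly spaced 16-level set are assumptions made for this example only; the feasible set of a real device may be unevenly spaced.

```python
import numpy as np

def project_to_levels(phi, levels):
    """Map each phase in phi to the closest entry of levels (circular distance)."""
    phi = np.asarray(phi, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # Signed angular difference between every pixel and every level,
    # wrapped to (-pi, pi] so that phases near +pi and -pi count as close.
    diff = np.angle(np.exp(1j * (phi[..., None] - levels)))
    nearest = np.argmin(np.abs(diff), axis=-1)
    return levels[nearest]

# Hypothetical feasible set: 16 evenly spaced levels, i.e., 4 bits (assumption).
levels = np.linspace(-np.pi, np.pi, 16, endpoint=False)
phi = np.random.default_rng(0).uniform(-np.pi, np.pi, size=(4, 6))
phi_q = project_to_levels(phi, levels)
```

Because the projection is piecewise constant, its derivative with respect to the input phase is zero almost everywhere, which is why Sec. 3.2 treats optimization through $q$ separately.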

Our framework approaches computer-generated holography with 
a differentiable camera-calibrated image formation model (Sec. 3.1), 
an optimization procedure designed for quantized SLMs (Sec. 3.2), 
and a family of loss functions supervised on either 2D, 2.5D, 3D, 
or 4D content to produce time-multiplexed holograms (Sec. 3.3). 
Figure 2 illustrates our model and optimization pipeline. 


3.1 Camera-calibrated Wave Propagation Model 


Recent work on holographic displays has demonstrated that the
naive application of simulated wave propagation models, like the
angular spectrum method (ASM) [Goodman 2014], to holographic
displays fails to account for the non-idealities of the physical optical
system, such as phase distortions of the SLM, optical aberrations,
and the limited diffraction efficiency of the SLM [Chakravarthula
et al. 2020; Choi et al. 2021a; Peng et al. 2020]. This discrepancy
between simulated and physical image formation adversely affects
image quality, but can be overcome by learning to calibrate for the
physical optics using a differentiable, neural network-parameterized
propagation model.

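Here, we propose a variant of the learned model recently proposed
by Choi et al. [2021a]:

\[
\begin{aligned}
f_{\text{model}}(u_{\text{SLM}}, z) &= \text{CNN}_{\text{target}}\left( f_{\text{ASM}}\left( \text{CNN}_{\text{SLM}}\left( a_{\text{src}} e^{i\phi_{\text{src}}} u_{\text{SLM}} \right), z \right) \right), \\
f_{\text{ASM}}(u, z) &= \iint \mathcal{F}(u) \cdot \mathcal{H}(f_x, f_y, z)\, e^{i 2\pi (f_x x + f_y y)}\, df_x\, df_y, \\
\mathcal{H}(f_x, f_y, z) &= a_{\mathcal{F}}\, e^{i \left( \frac{2\pi}{\lambda} \sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2}\, z + \phi_{\mathcal{F}} \right)},
\end{aligned}
\tag{3}
\]

where $\text{CNN}_{\text{SLM}}$ and $\text{CNN}_{\text{target}}$ are convolutional neural networks that
operate on the complex field at the SLM and target planes. The
target plane is a distance $z$ from the SLM. In addition, $a_{\text{src}}$ and $\phi_{\text{src}}$
are learned to account for content-independent spatial variations in
amplitude and phase of the incident source field at the SLM plane,
while $a_{\mathcal{F}}$ and $\phi_{\mathcal{F}}$ are added to the ASM propagation to learn spatial
variations in amplitude and phase in the Fourier plane, similarly to
the learned complex convolutional kernel presented by Kavakli et al.
[2022].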
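
To make the ASM part of Eq. 3 concrete, here is a minimal NumPy sketch of the propagation operator with optional Fourier-plane terms. It is not the calibrated model itself: the CNNs and source terms are omitted, the name `asm_propagate`, the pixel pitch, and the wavelength are illustrative assumptions, and evanescent components are simply set to zero (also an assumption).

```python
import numpy as np

def asm_propagate(u, z, wavelength, pitch, a_F=None, phi_F=None):
    """Propagate the complex field u by distance z with the angular spectrum method.

    a_F and phi_F (optional, same shape as u, in unshifted FFT frequency order)
    stand in for the Fourier-plane amplitude and phase terms of Eq. 3.
    """
    ny, nx = u.shape
    fx = np.fft.fftfreq(nx, d=pitch)  # spatial frequencies in cycles per meter
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    propagating = arg > 0
    kz = 2.0 * np.pi / wavelength * np.sqrt(np.where(propagating, arg, 0.0))
    # Transfer function; evanescent frequencies are dropped (assumption).
    H = np.where(propagating, np.exp(1j * kz * z), 0.0)
    if a_F is not None:
        H = H * a_F
    if phi_F is not None:
        H = H * np.exp(1j * phi_F)
    return np.fft.ifft2(np.fft.fft2(u) * H)

# Illustrative values only: 520 nm light, 10.8 um pixels, 1 cm propagation.
rng = np.random.default_rng(0)
u_slm = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(32, 32)))
u_z = asm_propagate(u_slm, z=0.01, wavelength=520e-9, pitch=10.8e-6)
```

With $a_{\mathcal{F}} = 1$ and $\phi_{\mathcal{F}} = 0$ the transfer function has unit magnitude for all propagating frequencies, so the sketch conserves energy and propagating by $-z$ undoes the propagation.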

Similar to Choi et al., we capture a training and a test set comprised
of a large number of SLM phase patterns and corresponding
amplitude images recorded at a set of distances $z^{\{j\}}, j = 1 \ldots J$ with
our prototype holographic display. Using a standard stochastic gradient
descent-type solver, we then fit the parameters of the CNNs,
$\text{CNN}_{\text{SLM}}$ and $\text{CNN}_{\text{target}}$, as well as $a_{\text{src}}$, $a_{\mathcal{F}}$, $\phi_{\text{src}}$, $\phi_{\mathcal{F}}$ to learn the calibrated
wave propagation model. The model used in this framework
builds upon the model from Choi et al. by using the terms $a_{\text{src}}$,
Table 1. Comparison of different calibrated wave propagation models. All
models are trained on 6 of the 7 planes. PSNR (dB) is evaluated for training and
test sets as well as for the 7th, held-out plane. The number of parameters of
each model is also reported. Training details are listed in Supplement S2.4.

Models                                                                        | Params. | Train | Test | Held-out
NH [Peng et al. 2020]                                                         | 41M     | 26.7  | 27.1 | 26.3
NH3D [Choi et al. 2021a]                                                      | 68.5M   | 34.4  | 32.4 | 31.9
Our model, CNNs only                                                          | 6.2M    | 31.6  | 29.7 | 30.0
+ $a_{\text{src}}$                                                            | 7.2M    | 35.3  | 35.4 | 32.3
+ $a_{\text{src}}$ + $\phi_{\text{src}}$                                      | 8.2M    | 36.2  | 36.3 | 33.0
+ $a_{\text{src}}$ + $\phi_{\text{src}}$ + $\phi_{\mathcal{F}}$               | 12.3M   | 36.5  | 36.4 | 32.8
+ $a_{\text{src}}$ + $\phi_{\text{src}}$ + $\phi_{\mathcal{F}}$ + lut         | 12.3M   | 36.4  | 36.4 | 32.8
+ $a_{\text{src}}$ + $\phi_{\text{src}}$ + $\phi_{\mathcal{F}}$ + $a_{\mathcal{F}}$ + lut | 16.4M   | 36.7  | 36.7 | 32.6


$\phi_{\text{src}}$, $\phi_{\mathcal{F}}$, and $a_{\mathcal{F}}$ to learn many of the content-independent
non-idealities of the holographic system. The source terms can efficiently
model the effects of non-ideal illumination at the SLM plane, and the
Fourier plane terms can compactly account for the effects of non-ideal
optical filtering. Together these terms enable the use of smaller
convolutional neural networks to learn the content-dependent
non-idealities, such as the spatially varying pixel response at the SLM.
Table 1 quantitatively assesses the effect of these physically-inspired
parameters by evaluating the performance of different calibrated
wave propagation models on a captured dataset. All models are
trained over 6 intensity planes, corresponding to 0.0 D, 0.5 D, 1.0 D,
1.5 D, 2.5 D, and 3.0 D in the physical space. A 7th plane at 2.0 D
is set as the held-out plane for evaluation. In this table, we also
ablate the performance of an additional lut parameter to optionally
learn the feasible set $\mathcal{Q}$ of quantized values supported by the
SLM. We observe that our model (bottom row) significantly reduces
the number of parameters when compared to the original NH3D
model, while still producing the highest PSNR on the test set and a
higher PSNR than NH3D on the held-out plane. Notably, the lagging
performance of the NH model, which is purely composed of physically-inspired
terms, illustrates the substantial benefit of incorporating the flexibility of
CNNs in a calibrated propagation model. Further details on our
model architecture and training are included in Supplement S2.4.

3.2 Optimizing Phase Patterns for Quantized SLMs


Emerging MEMS-based phase SLMs are fast but offer only a limited
precision for controlling phase. DLP’s phase SLM by Texas Instruments
(TI) [Bartlett et al. 2019], for example, runs at a maximum
framerate of 1440 Hz grayscale but only offers 4 bits, or 16 discrete
phase levels, at each of the frames. We therefore need to derive
methods that allow us to optimize phase patterns for heavily quantized
phase SLMs. The primary problem is that the quantization
function $q$ is not differentiable. To this end, we discuss and evaluate
several strategies for dealing with $q$ assuming some simple 2D
loss function $\mathcal{L}\left(s \left| f_{\text{model}}\left(e^{iq(\phi)}, z\right) \right|, a_{\text{target}}\right)$, where $a_{\text{target}}$ is the
desired 2D amplitude, and $s$ is a scale parameter that is optimized
along with $\phi$.

The naive solution to dealing with $q$ is to simply ignore it. Specifically,
the phase pattern $\phi$ can be optimized given a 2D target amplitude
image $a_{\text{target}}$ and quantized to the available precision after the
optimization. This is the approach typically adopted by state-of-the-art
CGH algorithms that work well for liquid crystal-type phase
SLMs, because these SLMs offer 8 bit or higher precision phase modulation.
TI's MEMS device enables time multiplexing but only offers
4 bits, which makes this approach impractical (see Fig. 3). Instead,
the reference code supplied with the SLM implements a variant of
projected gradient descent [Boyd et al. 2004], which projects the
iteratively updated solution onto the feasible set of quantized values
$\mathcal{Q}$. This approach is equivalent to a gradient descent-type update
scheme with step size $\alpha$ that applies $q$ after each iteration $k$ as:

\[
\begin{aligned}
\phi^{(k)} &\leftarrow \phi^{(k-1)} - \alpha \left( \frac{\partial \mathcal{L}}{\partial \phi} \right)^{T} \mathcal{L}\left( s \left| f_{\text{model}}\left( e^{i\phi^{(k-1)}}, z \right) \right|, a_{\text{target}} \right), \\
\phi^{(k)} &\leftarrow \Pi_{\mathcal{Q}}\left( \phi^{(k)} \right) = q\left( \phi^{(k)} \right).
\end{aligned}
\tag{4}
\]
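
The projected update of Eq. 4 can be sketched on a toy problem. The example below is an assumption-laden illustration, not the reference code mentioned above: the propagation model is replaced by a unitary 2D Fourier transform, the loss is a summed squared amplitude error with $s = 1$, the gradient is derived by hand for that toy model, and all function names are hypothetical.

```python
import numpy as np

def quantize(phi, levels):
    """Project each phase onto the closest level (circular distance)."""
    diff = np.angle(np.exp(1j * (phi[..., None] - levels)))
    return levels[np.argmin(np.abs(diff), axis=-1)]

def loss_and_grad(phi, a_target):
    """Squared amplitude error of a toy far-field model and its gradient w.r.t. phi."""
    u = np.exp(1j * phi)
    v = np.fft.fft2(u, norm="ortho")  # toy stand-in for f_model (assumption)
    amp = np.abs(v)
    loss = np.sum((amp - a_target) ** 2)
    g_v = (amp - a_target) * v / np.maximum(amp, 1e-12)  # dL / d conj(v)
    g_u = np.fft.ifft2(g_v, norm="ortho")                # adjoint of the unitary FFT
    grad = 2.0 * np.imag(g_u * np.conj(u))               # chain rule through exp(i phi)
    return loss, grad

def projected_gradient_descent(phi0, a_target, levels, step=0.1, iters=50):
    """Gradient step followed by projection onto the feasible set, as in Eq. 4."""
    phi = quantize(phi0, levels)
    for _ in range(iters):
        _, grad = loss_and_grad(phi, a_target)
        phi = quantize(phi - step * grad, levels)
    return phi

levels = np.linspace(-np.pi, np.pi, 16, endpoint=False)  # hypothetical 4-bit set
rng = np.random.default_rng(1)
phi_init = rng.uniform(-np.pi, np.pi, size=(8, 8))
a_target = rng.uniform(0.5, 1.5, size=(8, 8))
phi_pgd = projected_gradient_descent(phi_init, a_target, levels, iters=5)
```

One property visible in this sketch is that any update smaller than half the spacing between neighboring levels is erased by the projection, so a pixel only changes when its gradient step is large enough to cross into a neighboring level.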


As an alternative solution to solving these types of problems,
surrogate gradient methods are often used [Bengio et al. 2013; Zenke
and Ganguli 2018]. Here, the forward pass is computed using the
correct quantization function $q$, but during the error backpropagation
pass, the gradients of a differentiable proxy function $\hat{q}$ are used.
This enables improved optimization of phase patterns through a
quantization layer with the minimal overhead of computing the
proxy gradients:

\[
\phi^{(k)} \leftarrow \phi^{(k-1)} - \alpha \left( \frac{\partial \mathcal{L}}{\partial q} \cdot \frac{\partial \hat{q}}{\partial \phi} \right)^{T} \mathcal{L}\left( s \left| f_{\text{model}}\left( e^{iq\left(\phi^{(k-1)}\right)}, z \right) \right|, a_{\text{target}} \right).
\tag{5}
\]

Perhaps the most common choice for $\hat{q}$ is a sigmoid function, whose
slope can be gradually annealed during training [Bengio et al. 2013;
Chung et al. 2016; Zenke and Ganguli 2018].

We propose the use of a continuous relaxation of categorical variables
using Gumbel-Softmax [Jang et al. 2016; Maddison et al. 2016]
for optimizing heavily quantized phase values in CGH applications.
This approach has several desirable properties. First, the Gumbel
noise and categorical relaxation prevent the optimization from getting
stuck in local minima, which is perhaps the primary benefit
over other surrogate gradient methods. Second, annealing of the
temperature parameter $\tau$ of the softmax as well as the shape of the
score function are directly supported. Formally, this approach is
written as:

\[
\hat{q}(\phi) = \sum_{l=1}^{L} \mathcal{Q}_l \cdot G_l\left( \text{score}(\phi, \mathcal{Q}) \right),
\tag{6}
\]

\[
G_l(z) = \frac{\exp\left( (z_l + g_l) / \tau \right)}{\sum_{l'=1}^{L} \exp\left( (z_{l'} + g_{l'}) / \tau \right)},
\tag{7}
\]

\[
\text{score}_l(\phi, \mathcal{Q}) = \sigma\left( w \cdot \delta(\phi, \mathcal{Q}_l) \right) \left( 1 - \sigma\left( w \cdot \delta(\phi, \mathcal{Q}_l) \right) \right),
\tag{8}
\]

where $g_l \sim \text{Gumbel}(0, 1)$ is the Gumbel noise for all of the $l =
1, \ldots, L$ categories, i.e., quantized phase levels, $\sigma$ is a sigmoid function,
$\delta$ is the signed angular difference, and $w$ is a scale factor (see
Jang et al. [2016] and the supplement for additional details).
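
The relaxation in Eqs. 6-8 can be written out in a few lines of NumPy. This is a forward-pass sketch under stated assumptions: the function name, the default scale `w`, and the evenly spaced level set are illustrative, the weighted sum treats phase levels as plain real numbers (ignoring wrap-around), and no gradients are computed here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gumbel_softmax_quantize(phi, levels, w=20.0, tau=1.0, rng=None):
    """Relaxed quantization following Eqs. 6-8; rng=None disables the Gumbel noise."""
    phi = np.asarray(phi, dtype=float)
    # Signed angular difference between each pixel and each level, in (-pi, pi].
    delta = np.angle(np.exp(1j * (phi[..., None] - levels)))
    s = sigmoid(w * delta)
    score = s * (1.0 - s)  # Eq. 8: peaks where the angular difference is zero
    if rng is not None:
        u = rng.uniform(1e-12, 1.0, size=score.shape)
        score = score + (-np.log(-np.log(u)))  # add Gumbel(0, 1) samples
    logits = score / tau
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # Eq. 7
    return np.sum(weights * levels, axis=-1)  # Eq. 6

levels = np.linspace(-np.pi, np.pi, 16, endpoint=False)  # hypothetical 4-bit set
phi = levels + 0.05  # phases slightly off each level
relaxed = gumbel_softmax_quantize(phi, levels, tau=0.01)  # noise-free, low temperature
noisy = gumbel_softmax_quantize(phi, levels, tau=1.0, rng=np.random.default_rng(0))
```

At a low temperature and without noise the softmax weights approach a one-hot vector, so the relaxed output approaches the nearest level; at higher temperatures the output is a smooth blend of levels, which is what makes its gradient informative.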


3.3 Runtime Supervision of Time-multiplexed Holograms


Fast MEMS-based phase SLMs can produce higher-quality holograms
through time multiplexing, i.e., intensity averaging of multiple
frames. Given our camera-calibrated wave propagation model
(Sec. 3.1), we optimize for time-multiplexed holograms using different
target content at runtime.


2D Holography. In this case, we wish to synthesize a 2D intensity
image at a distance $z$ in front of the phase SLM. The distance can
be fixed or dynamically varied in software to enable a varifocal
holographic display mode. For this purpose, we specify the loss:

\[
\mathcal{L}_{\text{2D}} = \mathcal{L}\left( s \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left| f_{\text{model}}\left( e^{iq\left(\phi^{(t)}\right)}, z \right) \right|^2 },\; a_{\text{target}} \right)
\tag{9}
\]

between the target amplitude image $a_{\text{target}}$ and the simulated holographic
image and solve for $\phi$. We can easily formulate a time-multiplexed
variant of the CGH problem using this loss function
by summing over $t = 1 \ldots T$ squared amplitudes, i.e., intensities,
where $T$ refers to the total number of time-multiplexed frames that
can be displayed throughout the exposure time of the human eye.
The simplest example of the loss function $\mathcal{L}$ is an $\ell_2$ loss, although
other loss functions, such as perceptually motivated image quality
metrics, could be applied as well.
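
The time-multiplexed loss of Eq. 9 amounts to averaging intensities over frames before comparing amplitudes. The NumPy sketch below shows this for the $\ell_2$ case; the function names are hypothetical, and the per-frame fields are assumed to have already been computed by some propagation model.

```python
import numpy as np

def time_multiplexed_amplitude(fields):
    """fields: complex array of shape (T, H, W), one propagated field per frame.

    Returns the amplitude of the frame-averaged intensity, i.e., the square
    root of the mean over t of |field_t|^2.
    """
    intensity = np.mean(np.abs(fields) ** 2, axis=0)
    return np.sqrt(intensity)

def loss_2d(fields, a_target, s=1.0):
    """l2 instance of Eq. 9: mean squared error between scaled amplitude and target."""
    amplitude = s * time_multiplexed_amplitude(fields)
    return np.mean((amplitude - a_target) ** 2)

# Two frames with constant amplitudes 3 and 4: averaged intensity (9 + 16) / 2.
fields = np.stack([3.0 * np.ones((2, 2)), 4.0j * np.ones((2, 2))])
amplitude = time_multiplexed_amplitude(fields)
loss = loss_2d(fields, a_target=np.sqrt(12.5) * np.ones((2, 2)))
```

Note that the frames are combined in intensity, not in complex amplitude, which reflects the fact that interference only occurs within a frame and not between time-multiplexed frames.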


2.5D Holography using RGBD Input. Using the multiplane loss 
function presented by Choi et al. [2021a], holograms can be syn- 
thesized to generate a 2D set of intensities at depths specified by 
a depth map. We refer the interested reader to Supplement S2.5 
for the loss function and an additional discussion on utilizing time 
multiplexing to produce natural blur with 2.5D supervision. 


3D Multiplane Holography. True 3D holography can be achieved
by optimizing a single SLM phase pattern $\phi$ or a series of time-multiplexed
patterns $\phi^{(t)}$ for the target amplitude of a focal stack
$\text{fs}_{\text{target}}$. The corresponding loss function in our framework looks
very similar to that of the 2D hologram above, although it is evaluated
over the set of focal slices $z^{\{j\}}$:

\[
\mathcal{L}_{\text{3D}} = \mathcal{L}\left( s \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left| f_{\text{model}}\left( e^{iq\left(\phi^{(t)}\right)}, z^{\{j\}} \right) \right|^2 },\; \text{fs}_{\text{target}} \right).
\tag{10}
\]

Effectively optimizing this focal stack loss using the full blur available
within the diffraction angle of the SLM requires time multiplexing
as illustrated in Supplement S2.6.


4D Light Field Holography. Finally, we can also supervise our CGH
framework using the amplitudes of a 4D target light field $\text{lf}_{\text{target}}$.
For this purpose, a differentiable hologram-to-light field transform
is required, which can be calculated using the short-time Fourier
transform (STFT) [Padmanaban et al. 2019; Zhang and Levoy 2009]:

\[
\mathcal{L}_{\text{4D}} = \mathcal{L}\left( s \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left| \text{STFT}\left( f_{\text{model}}\left( e^{iq\left(\phi^{(t)}\right)}, z \right) \right) \right|^2 },\; \text{lf}_{\text{target}} \right).
\tag{11}
\]

By utilizing time multiplexing, our optimized holograms can uniquely
reproduce a set of light field views that fully covers the SLM’s space-bandwidth
product as detailed in Supplement S2.7.
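
A hologram-to-light field transform of the kind used in Eq. 11 can be sketched as a windowed 2D Fourier transform over local patches of the complex field. The version below is a simplified illustration rather than the transform used in the experiments: the function name, the Hann window, and the window and hop sizes are assumptions, and the loop-based implementation favors clarity over speed.

```python
import numpy as np

def hologram_to_light_field(field, win=8, hop=8):
    """Amplitude of a 2D short-time Fourier transform of a complex field.

    Returns an array of shape (ny, nx, win, win): spatial patch indices (y, x)
    followed by centered spatial-frequency indices (fy, fx), which play the
    role of the angular (view) dimensions of a light field.
    """
    window = np.hanning(win)
    window2d = np.outer(window, window)
    height, width = field.shape
    ys = range(0, height - win + 1, hop)
    xs = range(0, width - win + 1, hop)
    lf = np.empty((len(ys), len(xs), win, win))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            patch = field[y:y + win, x:x + win] * window2d
            lf[i, j] = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
    return lf

# A tilted plane wave with 2 cycles per 8-pixel window along x.
x = np.arange(32)
plane_wave = np.tile(np.exp(2j * np.pi * 2 * x / 8), (32, 1))
lf = hologram_to_light_field(plane_wave, win=8, hop=8)
```

In this example every spatial patch concentrates its energy in the same frequency bin, matching the intuition that a plane wave is seen from a single direction everywhere on the hologram plane; the spatial frequency $f_x$ relates to the propagation angle $\theta$ through $\sin\theta = \lambda f_x$.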
Fig. 3. Evaluation of CGH algorithms for fast, heavily quantized phase SLMs.
We show simulations of 4 bit phase quantization with varying numbers 
of time-multiplexed frames, showing the average PSNR over 14 example 
images. The projected gradient descent (GD) improves upon the naive 
method, which ignores quantization. Surrogate gradient (SG) methods 
replace the gradients of the non-differentiable quantization operator in the 
backpropagation pass using either a sigmoid or a Gumbel-Softmax (GS) 
function. The latter is found to outperform other approaches by a large 
margin, especially with faster SLMs. Remarkably, our framework using 
only 4 bit precision with 8 time-multiplexed frames even outperforms a 
conventional 8 bit phase SLM without time multiplexing (red dashed line). 


Fig. 4. Learned optical filters for three channels, corresponding to the amplitude
distribution on the Fourier plane $a_{\mathcal{F}}$ that is indicated in Sec. 3.1 and
Table 1. On the left we show the photograph of the physical iris used in the
system acting as the optical filter. Our model accurately learns the shape of
the physical iris and, as expected, its diameter in the learned model varies
according to wavelength.


4 EXPERIMENTS 


To evaluate our novel algorithms, we use a benchtop 3D holographic 
display prototype. This prototype includes a FISBA RGBeam fiber- 
coupled module with red, green, and blue optically aligned laser 
diodes for illumination and a TI DLP6750Q1EVM phase SLM for 
high-speed quantized phase modulation. We capture the images 
produced by this prototype with a FLIR Grasshopper3 12.3 MP color 
USB3 sensor through a Canon EF 35mm lens with focus controlled 
by an Arduino microcontroller. Further details of the prototype are 
included in Supplement S1. 


Comparing CGH Algorithms. We compare several CGH approaches
for the task of optimizing phase patterns for a fast phase SLM with
4 bits, or 16 phase levels, in Fig. 3. The naive approach, which quantizes
the phase after optimization, performs poorly, as measured
by the peak signal-to-noise ratio (PSNR). The projected gradient
descent approach performs better and shows improvements with
increasing SLM speed. The surrogate gradient (SG) methods, using
the gradients of either a sigmoid or the Gumbel-Softmax,
are significantly better than the other methods, with Gumbel-Softmax
outperforming all other methods by a large margin, especially for
higher-speed SLMs. This experiment represents the TI SLM with
Fig. 5. Comparison of 2D CGH algorithms using experimentally captured data. Here, we compare SGD algorithms using the ASM w/ Naive (1st column),
Model w/ Naive (2nd column), and Model w/ GS without time multiplexing (3rd column) and with 8 multiplexed frames (4th column). Our calibrated wave 
propagation model and Gumbel-Softmax quantization layer result in sharper images with higher contrast and less speckle than others under the same 


experimental conditions. Quantitative evaluations are included as PSNR/SSIM. 


4 bits and up to 480 Hz color, i.e., 8 multiplexed frames each running
at 60 Hz for a total of 480 Hz. We evaluate other bit depths in the
supplement and show similar trends. Finally, Gumbel-Softmax can
be used as part of an SG method (Eq. 5) using only its gradients $\partial \hat{q} / \partial \phi$,
or it can be used to replace $q$ by $\hat{q}$ also in the forward image formation.
We found the former performs better in most settings, and
therefore only report these results in the paper; see the supplement
for evaluations of the latter approach.
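
For reference, the PSNR metric used in these comparisons can be computed as in the short sketch below. The function name and the assumption that images are normalized to a peak value of 1 are illustrative; whether the reported numbers are evaluated on amplitude or intensity images is not specified here, so the sketch leaves that choice to the caller.

```python
import numpy as np

def psnr(img, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB between an image and its reference."""
    img = np.asarray(img, dtype=float)
    ref = np.asarray(ref, dtype=float)
    mse = np.mean((img - ref) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# A constant error of 0.1 on a unit-peak image gives an MSE of 0.01.
ref = np.zeros((4, 4))
img = ref + 0.1
value = psnr(img, ref)
```

Since PSNR is logarithmic in the mean squared error, a gain of 10 dB corresponds to a tenfold reduction in mean squared error.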


Learning Physical Filters. We visualize in Figure 4 the performance
of our learned model in accurately approximating the optical filter,
which is an iris in the physical display system. As expected, values
outside the filters are all zeros. The shape of the blade edges is robustly
learned with our model and scales with wavelength as expected. The
variation in diameter also follows the variation in wavelength.
Refer to Figure S7 in the supplement for visualization of the full
model.


Assessing 2D Holography. We present in Figure 5 experimental re- 
sults of 2D holographic display assessing different CGH algorithms 
and different multiplexing schemes. In this experiment, we compare 
SGD algorithms using the ASM with Naive quantization, our model 
with Naive quantization, and our model with Gumbel-Softmax (GS). 
We make two observations. First, the use of our calibrated wave propagation
model corrects for most artifacts present in the physical
display. Second, applying the GS operation leads to better performance
in such heavily quantized optimization problems. Refer also
to Figures S8-9, as well as Tables S1 and S2 in the supplementary
document for both quantitative and qualitative assessments of other
examples.


Assessing 3D Holography. We present in Figure 6 experimental 
results of 3D holographic display assessing different CGH algo- 
rithms. In this experiment, we compare SGD algorithms with the 
prior state-of-the-art NH3D model and Naive quantization using 
RGBD input [Choi et al. 2021a] with 1 frame and 8 multiplexed 
frames, respectively, our model with Gumbel-Softmax (GS), and 
our model with GS using focal stack supervision. PSNR metrics 
are provided in the caption. Using only a single frame results in
speckled in-focus content (shown with red squares in Figure 6). Even
with multiple frames, RGBD supervision produces speckle in the
unconstrained out-of-focus regions. However, with our focal stack
supervision and time multiplexing, we observe natural out-of-focus
blur, while still preserving sharpness for the in-focus content. For
example, the branch at the intermediate depth is sharp, and the sky
in the background is smooth. In the supplement, we show extensive
evaluations and ablations of 3D multiplane CGH methods for more
3D scenes (Figures S3-4 and S10-16).


Assessing 4D Light Field Holography. We present in Figure 7 experimental
results of 4D light field-supervised holographic display,
assessing different CGH algorithms. In this experiment, we compare
the OLAS [Padmanaban et al. 2019] algorithm, our approach
using light field supervision with the ASM and naive quantization
(ASM-Naive), and our approach with the camera-calibrated wave
propagation model and Gumbel-Softmax (Model-GS) to account
for the low bit depth of the SLM. The OLAS algorithm requires
light field and depth maps for each light field view as input and it 
does not support time multiplexing. Both variants of our method 
do not require depth maps and jointly optimize 8 time-multiplexed 
frames using SGD. For each example scene, we show close-ups of 
content at two distances (far, near). We observe that our frame- 
work exhibits the best image quality for both in-focus (red squares) 
and out-of-focus regions (white squares). Refer also to Figures S5 
and S17 in the supplementary document for additional simulation 
and experimental results. 


5 DISCUSSION 


In summary, we present a new framework for computer-generated 
holography. This framework includes a camera-calibrated wave 
propagation model that combines parts of the recently proposed 
model in a novel way to achieve a better performance with fewer 
model parameters. We explore surrogate gradient methods for op- 
timizing the heavily quantized SLM patterns of emerging MEMS- 
based phase SLMs and show the Gumbel-Softmax algorithm to 
outperform other approaches. Our framework is flexible in sup- 
porting 2D, 2.5D, 3D, and 4D supervision at runtime and we show 
Fig. 6. Comparison of 3D CGH algorithms using experimentally captured data. Here, we compare SGD algorithms with the prior state-of-the-art NH3D
model and Naive quantization using RGBD input [Choi et al. 2021a] with 1 frame and 8 multiplexed frames, respectively, our model with Gumbel-Softmax 
(GS), and our model with GS using focal stack supervision. The corresponding PSNR metrics are 24.3 dB, 25.8 dB, and 26.7 dB with respect to the RGBD 
all-in-focus targets (left 3 columns), and 26.9 dB with respect to the focal stack (right column). For close-ups, red squares indicate where the camera is focused 
at three distances (from top to bottom: far, intermediate, and near). 


Fig. 7. Comparison of 4D light field-supervised CGH algorithms using experimentally captured data. Here, we compare the OLAS algorithm [Padmanaban
et al. 2019] (1st column) without time multiplexing, and three variants of our approach: ASM-Naive without time multiplexing (2nd column) and with 8 
multiplexed frames (3rd column) and Model-GS with 8 multiplexed frames (4th column). For close-ups, red squares indicate where the camera is focused 
at two distances (top: far, bottom: near). Since OLAS deterministically computes a single phase pattern for a target light field, there would be no variation 
between time-multiplexed frames. 


state-of-the-art results in all of these scenarios with our near-eye
holographic display prototypes.

Limitations and Future Work. Image quality could be further improved
by increasing the precision and framerate of the employed
phase SLMs and, importantly, by improving their diffraction efficiency.
In Figure S6 of our supplement, we explore the simulated
image quality with varying levels of time multiplexing and bit depth,
but analytically deriving this landscape remains an interesting direction
for future work. Our algorithms do not run in real
time, but require on the order of tens of seconds to a few minutes
to compute a hologram. Neural networks could be employed to
speed up the computation, as recently demonstrated by Horisaki et
al. [2018], Peng et al. [2020], and Shi et al. [2021]. Due to their limited
space-bandwidth product, holographic near-eye displays only
provide a limited eye box, which could be addressed by dynamically
steering it using eye tracking [Jang et al. 2017]. The depth of field of
3D-supervised holograms in AR scenarios should match that of the
user's eye, which requires tracking their pupil diameter. Finally, we
demonstrated our results on benchtop prototype displays, which
will have to be miniaturized into the impressive device form factors
presented by Maimone et al. [2017] and Maimone and Wang [2020].


Conclusion. The algorithmic advances presented in this work 
help make holographic near-eye displays a practical technology for 
next-generation VR/AR systems. 